OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Digitalizálás - a Digitalizált Törvényhozási Tudástár projekt tapasztalatai

Internal identifier: 000212 (Main/Exploration); previous: 000211; next: 000213


Authors: Boros Ildiko [Hungary]

Source: Tudományos és müszaki tájékoztatás (Nyomtatott), ISSN 0041-3917, 2013

RBID : Pascal:14-0131397

French descriptors

English descriptors

Abstract

The Digitised Legislation Knowledge Store (DTT) project of the Library of the Hungarian Parliament ran from January 4, 2010 to November 30, 2012 as a priority project within the Electronic Administration Operational Programme (EKOP), supported by the European Union and co-financed by the European Regional Development Fund. This article describes the digitisation workflow. Once the IT infrastructure was in place, the general technical parameters were defined and work began on the physical and logical presentation of the digital documents. Next came the workflow for mass uploading: METS/XML files were prepared from templates for the various groups of materials, and quality control plans were drawn up using mathematical-statistical methods.

Principles for selecting works for digitisation were defined, with categories according to field of science, document type, language, period, etc. Hungarian-language materials were selected from the main collecting areas of the Library (law, history and political science). The size, structure and condition of the selected volumes were recorded on status sheets. Books were described individually, while journals, official gazettes and decisions were described in groups in a conservation status database. To select suitable copies, it was necessary to identify the bibliographic, copy and publishing data in the catalogues; to check, improve and prepare for conversion the records of the books and journals concerned; and to enter their copyright status and collection organisation codes into the database. Metadata were prepared. The digitisation company was chosen through a public procurement process.

Because of the value of the works selected, their uniqueness and in many cases irreplaceable nature, and for preservation reasons, digitisation took place on site, mostly using a Kirtas KABIS III robotic scanner. Large volumes and fold-out attachments were digitised on a flatbed scanner. Image processing, including cropping and correction, was carried out with the Book Scan Editor (BSE) software. Metadata were assigned to each page of each volume with the GLOBE-Index software. This was followed by optical character recognition (OCR): after image processing the data were transferred into the OCR database, and the OCR Engine automatically produced the final two-layer PDF files. The prepared pages were then loaded into the DigiTool software in this format. Checking began with an automatic control of the TIFF images, followed by a manual inspection covering the general quality of the scanning results.

In total, the project digitised two million pages from 5272 volumes. The documents can be accessed in accordance with the copyright legislation in force. A large share of the works (40%) is under copyright protection; these can be displayed only on the Library's computers, for scientific research or private study. Works that are not subject to copyright protection are available to the public on the Internet without any limitation. The DTT portal is barrier-free, so visually impaired users can use it as well.
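
The abstract mentions quality control plans based on mathematical-statistical methods but does not say which method was used. Purely as an illustration, the sketch below computes how many pages one might inspect to estimate a defect rate across the whole output, using the standard sample-size formula for a proportion with a finite population correction. The 95% confidence level (`z = 1.96`), the 1% margin of error, the worst-case proportion `p = 0.5` and the function name `sample_size` are all assumptions introduced here, not details from the project.

```python
import math


def sample_size(population: int, margin: float = 0.01,
                z: float = 1.96, p: float = 0.5) -> int:
    """Number of items to inspect to estimate a proportion within +/- margin.

    Hypothetical helper: the confidence level (z), margin and p are
    illustrative assumptions, not parameters reported by the DTT project.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction shrinks the sample for small populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Figures stated in the abstract.
TOTAL_PAGES = 2_000_000
TOTAL_VOLUMES = 5272

pages_to_inspect = sample_size(TOTAL_PAGES)
avg_pages_per_volume = round(TOTAL_PAGES / TOTAL_VOLUMES)

print(f"Pages to inspect: {pages_to_inspect}")
print(f"Average pages per volume: {avg_pages_per_volume}")
```

With a population this large the correction barely changes the result; it matters more when sampling within a single volume or a small batch.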


Affiliations:




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="HUN" level="a">Digitalizálás - a Digitalizált Törvényhozási Tudástár projekt tapasztalatai</title>
<author>
<name sortKey="Ildiko, Boros" sort="Ildiko, Boros" uniqKey="Ildiko B" first="Boros" last="Ildiko">Boros Ildiko</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Az Országgyülési Könyvtár Gyüjteményszervezési osztályának vezetöje</s1>
<s3>HUN</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Hongrie</country>
<wicri:noRegion>Az Országgyülési Könyvtár Gyüjteményszervezési osztályának vezetöje</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">14-0131397</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 14-0131397 INIST</idno>
<idno type="RBID">Pascal:14-0131397</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000020</idno>
<idno type="stanalyst">FRANCIS 14-0131397 INIST</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000036</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000744</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000055</idno>
<idno type="wicri:doubleKey">0041-3917:2013:Ildiko B:digitalizalas:a:digitalizalt</idno>
<idno type="wicri:Area/Main/Merge">000215</idno>
<idno type="wicri:Area/Main/Curation">000212</idno>
<idno type="wicri:Area/Main/Exploration">000212</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="HUN" level="a">Digitalizálás - a Digitalizált Törvényhozási Tudástár projekt tapasztalatai</title>
<author>
<name sortKey="Ildiko, Boros" sort="Ildiko, Boros" uniqKey="Ildiko B" first="Boros" last="Ildiko">Boros Ildiko</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Az Országgyülési Könyvtár Gyüjteményszervezési osztályának vezetöje</s1>
<s3>HUN</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Hongrie</country>
<wicri:noRegion>Az Országgyülési Könyvtár Gyüjteményszervezési osztályának vezetöje</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Tudományos és müszaki tájékoztatás : (Nyomtatott)</title>
<title level="j" type="abbreviated">Tud. müsz. táj. : (Nyomt.)</title>
<idno type="ISSN">0041-3917</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Tudományos és müszaki tájékoztatás : (Nyomtatott)</title>
<title level="j" type="abbreviated">Tud. müsz. táj. : (Nyomt.)</title>
<idno type="ISSN">0041-3917</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Hungary</term>
<term>Library</term>
<term>Project</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Hongrie</term>
<term>Projet</term>
<term>Bibliothèque</term>
</keywords>
<keywords scheme="Wicri" type="geographic" xml:lang="fr">
<term>Hongrie</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Bibliothèque</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The Digitised Legislation Knowledge Store (DTT) project of the Library of the Hungarian Parliament was implemented in the period January 4, 2010 to November 30, 2012, as a priority project, in the framework of the Electronic Administration Operational Programme (EKOP), supported by the European Union and co-financed by the European Regional Development Fund. This article describes the workflow of digitisation. After creating the IT background the general technical parameters were defined, and the physical and logical presentation of digital documents started. This was followed by the workflow for mass uploading: METS/XML was prepared based on templates for various groups of materials, and plans for quality control were made using mathematical-statistical methods. Principles for the selection of works for digitisation were defined, with categories according to fields of science, document types, language, time, etc. Hungarian-language materials were selected from the main collecting areas of the Library (law, history and political science). The size, structure and condition of the volumes selected were registered on status sheets. Books were described individually, while journals, official gazettes and decisions were described in groups in a conservation status database. In order to select suitable copies it was necessary to identify the bibliographic, copy and publishing data in catalogues, and to check, improve and prepare for conversion the records of books and journals concerned, to enter their copyright status and collection organisation codes into the database. Metadata were prepared. The digitisation company was chosen within a public procurement process. Because of the value of works selected for digitisation, their uniqueness and in many cases irreplaceable nature, as well as preservation considerations, digitisation took place on site, mostly using a Kirtas KABIS III. robotic scanner. 
Large volumes and fold-out attachments were digitised by a flatbed scanner. The processing of the images, their cutting and correction was carried out by the Book Scan Editor (BSE) software. Metadata were assigned to each page of each volume by the GLOBE-Index software. This was followed by optical character recognition (OCR): after image processing the data were transferred into the OCR database, and the OCR Engine automatically created the final two-layer PDF file format. The pages prepared were then input into the DigiTool software in this format. The first step of checking was the automatic control of TIFF images. During manual inspection the general quality control of scanning results took place. Within the project two million pages have been digitised, the total number of volumes was 5272. The documents can be accessed in accordance with the copyright legislation in force. A major part of works (40%) is under copyright protection; consequently they can be displayed on the Library's computers only for scientific research or private study. The works that are not subject to copyright protection are available without any limitation to the public on the Internet. The DTT portal is barrier-free; those visually impaired can use it properly as well.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Hongrie</li>
</country>
</list>
<tree>
<country name="Hongrie">
<noRegion>
<name sortKey="Ildiko, Boros" sort="Ildiko, Boros" uniqKey="Ildiko B" first="Boros" last="Ildiko">Boros Ildiko</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
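
The record above uses the `wicri:` and `inist:` prefixes without declaring them, so a strict XML parser rejects it as written. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and an abridged copy of the record. It binds the two prefixes to placeholder URIs and then extracts a few fields. The `urn:example:` values and the helper names `parse_record` and `summarize` are invented for illustration; they are not the real Wicri or INIST namespaces or part of Dilib.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record, keeping only the elements read below.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="HUN" level="a">Digitalizálás - a Digitalizált Törvényhozási Tudástár projekt tapasztalatai</title>
<author>
<name sortKey="Ildiko, Boros" first="Boros" last="Ildiko">Boros Ildiko</name>
<affiliation wicri:level="1">
<country>Hongrie</country>
</affiliation>
</author>
</titleStmt>
<seriesStmt>
<idno type="ISSN">0041-3917</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Hungary</term>
<term>Library</term>
<term>Project</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs (assumption): the source does not state the real ones.
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "inist": "urn:example:inist"}

# The xml: prefix is predefined and expands to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_record(xml_text: str) -> ET.Element:
    """Declare the missing prefixes on the root element, then parse."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(root: ET.Element) -> dict:
    """Collect title, author, ISSN and English keywords from the header."""
    title = root.find(".//titleStmt/title")
    return {
        "title": title.text,
        "title_lang": title.get(XML_LANG),
        "author": root.find(".//titleStmt/author/name").text,
        "issn": root.find(".//seriesStmt/idno[@type='ISSN']").text,
        "keywords_en": [t.text for t in
                        root.findall(".//keywords[@scheme='KwdEn']/term")],
    }


summary = summarize(parse_record(RECORD))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Patching the root tag as a string is a shortcut that works because the record opens with a bare `<record>` element; a record with attributes on the root would need a more careful rewrite.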

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000212 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000212 | SxmlIndent | more
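
Both commands pipe the selected record through `SxmlIndent` to make it readable. Where Dilib is not installed, a record already saved as text could be indented in a similar spirit with Python's standard library. This is only a sketch: `indent_xml` is a hypothetical helper, `ET.indent` requires Python 3.9 or later, and its output is not claimed to match `SxmlIndent` exactly. The full record would also need its `wicri:` and `inist:` prefixes declared before a strict parser accepts it, so the example uses a small prefix-free fragment.

```python
import xml.etree.ElementTree as ET


def indent_xml(xml_text: str, space: str = "  ") -> str:
    """Return xml_text re-serialised with one element per line."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # in-place; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


# Small fragment in the shape of the record, without namespace prefixes.
compact = '<record><TEI><front><div type="abstract">text</div></front></TEI></record>'
print(indent_xml(compact))
```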

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:14-0131397
   |texte=   Digitalizálás - a Digitalizált Törvényhozási Tudástár projekt tapasztalatai
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024